Detection of Malicious PDF Files Based on Hierarchical Document Structure
Abstract
Malicious PDF files remain a real, practical threat to masses of computer users, even after several high-profile security incidents. In spite of a series of security patches issued by Adobe and other vendors, many users still have vulnerable client software installed on their computers. Furthermore, the expressiveness of the PDF format enables attackers to evade detection with little effort. Apart from traditional antivirus products, which are always a step behind attackers, few methods are known that can be deployed to protect end-user systems. In this paper, we propose a highly performant static method for detecting malicious PDF documents which, instead of analyzing JavaScript or any other content, makes use of essential differences in the structural properties of malicious and benign PDF files. We demonstrate its effectiveness on a data corpus containing about 660,000 real-world malicious and benign PDF files, both in laboratory conditions and during a 10-week operational deployment with weekly retraining. Additionally, we present the first comparative evaluation of several learning setups with regard to resistance against adversarial evasion and show that our method is reasonably resistant to sophisticated attack scenarios.
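The abstract does not say how the structural properties are encoded. As a minimal sketch, assuming the PDF has already been parsed into nested Python dicts (PDF dictionaries) and lists (PDF arrays) with the document catalog under `/Root`, the hierarchy can be flattened into a bag of structural paths such as `/Root/Pages/Kids/Type` and mapped onto a fixed vocabulary to obtain a binary feature vector for a classifier. The function names `structural_paths` and `to_feature_vector`, the `max_depth` guard and the two toy documents are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from typing import Any


def structural_paths(node: Any, prefix: str = "", max_depth: int = 10) -> Counter:
    """Count key paths in a nested dict/list representation of a PDF object tree.

    Hypothetical input format: dict keys are PDF names (e.g. "/Pages"),
    dicts model PDF dictionaries and lists model PDF arrays.
    """
    counts: Counter = Counter()
    if max_depth <= 0:
        # Depth guard: stop descending so self-referencing structures terminate.
        return counts
    if isinstance(node, dict):
        for key, child in node.items():
            path = prefix + key  # e.g. "/Root" + "/Pages" -> "/Root/Pages"
            counts[path] += 1
            counts.update(structural_paths(child, path, max_depth - 1))
    elif isinstance(node, list):
        # Array elements share their parent's path, so no segment is added.
        for child in node:
            counts.update(structural_paths(child, prefix, max_depth - 1))
    return counts


def to_feature_vector(counts: Counter, vocabulary: list[str]) -> list[int]:
    """Binary presence features over a fixed, ordered vocabulary of paths."""
    return [1 if counts[path] > 0 else 0 for path in vocabulary]


# Two toy documents (hypothetical, hand-written for illustration).
benign = {
    "/Root": {
        "/Type": "/Catalog",
        "/Pages": {"/Type": "/Pages", "/Kids": [{"/Type": "/Page"}, {"/Type": "/Page"}]},
    }
}
suspicious = {
    "/Root": {
        "/Type": "/Catalog",
        "/OpenAction": {"/S": "/JavaScript", "/JS": "app.alert(1)"},
        "/Pages": {"/Type": "/Pages", "/Kids": [{"/Type": "/Page"}]},
    }
}

benign_paths = structural_paths(benign)
suspicious_paths = structural_paths(suspicious)
# The vocabulary would normally be learned from a training corpus.
vocabulary = sorted(set(benign_paths) | set(suspicious_paths))

for name, paths in [("benign", benign_paths), ("suspicious", suspicious_paths)]:
    print(name, to_feature_vector(paths, vocabulary))
```

A path such as `/Root/OpenAction/JS` describes where something sits in the document rather than what the script does, so features of this kind do not require deobfuscating JavaScript. The depth limit is a simple safeguard: real PDF object graphs contain back-references (for example `/Parent` on page objects) that would make a naive recursive traversal loop forever.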
Similar resources
A Pattern Recognition System for Malicious PDF Files Detection
Malicious PDF files have been used to harm computer security during the past two to three years, and modern antivirus products are proving not to be completely effective against this kind of threat. In this paper, an innovative technique that combines a feature extractor module strongly related to the structure of PDF files with an effective classifier is presented. This system has proven to be more effec...
Hidost: a static machine-learning-based detector of malicious files
Malicious software, i.e., malware, has been a persistent threat in the information security landscape since the early days of personal computing. Recent targeted attacks extensively use non-executable malware as a stealthy attack vector. There exists a substantial body of previous work on the detection of non-executable malware, including static, dynamic, and combined methods. While static ...
Malicious PDF Document Detection Based on Feature Extraction and Entropy
In this paper, we present a machine-learning-based approach for the detection of malicious PDF documents. We identify various features in PDF documents that are used by malware authors to construct malicious files. Based on this feature set, we arrive at models that are used to detect malicious PDF documents. Based on these feature sets, the detection rate is high compared to approaches which depen...
Advanced Detection Tool for PDF Threats
In this paper we introduce an efficient application for malicious PDF detection: ADEPT. With targeted attacks rising over the recent past, exploring a new detection and mitigation paradigm becomes mandatory. The use of malicious PDF files that exploit vulnerabilities in well-known PDF readers has become a popular vector for targeted attacks, for which few efficient approaches exist. Although si...